# Concepts

## Generic Audio Concepts

![Audio sources, instances, voices, mix busses.](images/concepts)\ 

### Audio Source and Instance


SoLoud uses two kinds of classes for the sounds. Audio sources contain
all the information related to the sound in question, such as wave
sample data, while audio instances contain information about an
"instance" of the sound.

As an analogue, if you think of an old vinyl record, the audio source is
the record, and you can put as many playheads - the instances - on the
record. All of the playheads can also move at different speeds, output
to a different pan position and volume, as well as different filter 
settings.

![Audio sources and back-ends.](images/source_backend)\ 

### Back end


SoLoud itself "only" performs audio mixing, resource handling and bookkeeping.
For it to be useful, it needs one or more sound source and a back end.
Some other audio systems use the term 'sink' for the back-ends (as in
audio source and audio sink).

Examples of back-ends would be winmm, oss, portaudio, wasapi and SDL audio. 
SoLoud comes with several back-ends, and is designed to make back-ends 
relatively easy to implement.

Different back-ends have different characteristics, such as how much
latency they introduce.

### Channel


One audio stream can contain one or more channels. Typical audio sources
are either mono (containing one channel) or stereo (containing two
channels), but surround sound audio sources may practically have any
number of channels.

In module music (such as mod, s3m, xm, it), "channel" means one of the
concurrent sounds played, regardless of speaker configuration. Confusing?
Yes.

### Voice

SoLoud can play audio from several sound sources at once (or, in fact,
several times from the same sound source at the same time). Each of
these sound instances is a "voice". The number of concurrent voices is
limited, as having unlimited voices would cause performance issues, as
well as lead to unnecessary clipping.

The default number of concurrent voices - maximum number of "streams" -
is 16, but this can be adjusted at runtime. The hard maximum number is 
4095, but if more are required, SoLoud can be modified to support more. 
But seriously, if you need more than 4095 sounds at once, you're probably
going to make some serious changes in any case.

If all channels are already playing and the application requests another
sound to play, SoLoud finds the oldest voice and kills it. Since this
may be your background music, you can protect channels from being killed
by using the soloud.setProtect() call.

SoLoud also supports virtual voices, so things are a bit more complicated
- basically you can have thousands of voices playing, but only the most
audible ones are actually played.

### Virtual Voices

SoLoud lets you play way more voices than will actually be played
to the user. This can be useful if you, for instance, populate a
3d world with hundreds of audio sources.

SoLoud will sort the voices based on their current volume level
and only mixes the most audible sounds. The number of active
voices can be set at runtime. Protected voices are always played.

The maximum number of virtual voices is currently 1024, but can
be increased up to a hard limit of 4095 by editing soloud.h and
recompiling. Higher virtual voice counts than that will require 
some refactoring.

### Voice Group


Sometimes it is important to be able to command several voices at the
same time so that they are synchronized; for instance, when cross-fading
between two versions of the same song.

![Problem unpausing two voices. Delay may vary, 40ms used as an example.](images/voicegroups)\ 

Even if you try to unpause the two voices at the same time, it's possible,
due to the multithreaded nature of audio, that the audio engine interrupts
you between these two calls and your sounds get unpaused to different audio
buffers.

SoLoud's solution to this are voice groups. Voice groups can be commanded
the same way as single voices, but instead of affecting just one voice,
SoLoud performs the command on all of the voices in the group in one
atomic operation.

### Clipping


Audio hardware always has a limited dynamic range. If you think of a
signed 16-bit variable, for instance, you can only store values from
-32768 to 23767 in it; if you try to put values outside this range in,
things tend to break. Same goes for audio.

![Results of different clippers.](images/clipping)\ 

SoLoud handles all audio as floats, but performs clipping before passing
the samples out, so all values are in the -1..1 range. There's two ways
SoLoud can perform the clipping; the most straightforward is simply to
set all values outside this range to the border value, or alternatively
a roundoff calculation can be performed, which "compresses" the loud
sounds. The more quiet sounds are largely unchanged, while the loud end
gets less precision. The roundoff clipper is used by default.

The roundoff clipper does, however, alter the signal and thus "damages"
the sound. A more proper way of doing things would be to use the basic
clipper and adjust the global volume to avoid clipping. The roundoff
clipper is, however, easier to use.

### Sample


The real world has continuous signals, which would require infinite
amount of storage to store (unless you can figure out some kind of
complicated mathematical formula that represents the signal). So, we
store discrete samples of signals instead. These samples have
traditionally been 8, 16 or 24 bit, but high-end audio is tending
towards floating point samples.

SoLoud also uses floating point samples internally. First and foremost,
it makes everything much simpler, and second, modern computing devices
(even mobile!) have become fast enough that this is not really a 
performance issue anymore.

Floating point samples also take more space than, for instance, 16 bit
samples, but memory and storage sizes have also grown enough to make
this a feasible approach. Nothing stops the audio sources from keeping
data in a more "compressed" format and performing on-the-fly conversion
to float, if memory requirements are a concern.

### Sample Rate


The sample rate represents the number of samples used, per second.
Typical sample rates are 8000Hz, 22050Hz, 44100Hz and 48000Hz. Higher
the sample rates mean clearer sound, but also bigger files, more memory
and higher processing power requirements.

Due to limitations in human hearing, 44100Hz is generally considered 
sufficient. Some audiophiles disagree, but then again, some audiophiles
buy gold-plated USB cables.

### Hz


Hertz, SI unit of frequency. 0.1Hz means "once per 10 seconds", 1Hz means 
"once per second", 10Hz means "10 times per second", and 192kHz means 
"192000 times per second".

### Play Speed


In addition to a base sample rate, which represents the "normal" playing
speed, SoLoud includes a "relative play speed" option. This simply
changes the sample rate. However, if you replace your sounds with
something that has a different "base" sample rate, using the relative
play speed will retain the effect of playing the sound slower (and
lower) or faster (and higher).

### Relative Play Speed


SoLoud lets you change the relative play speed of samples. Please note
that asking for a higher relative play speed is always more expensive 
than a lower one.

Playing samples at higher or lower sample rate than they're intended
can also cause resampling issues.

### Resampling


SoLoud has to perform resampling when mixing. In an ideal case, all of
the sources and the destination sample rate are the same, and no
resampling is needed, but this is often not true. 

![Different resamplers (Point / linear / catmull-rom). Red is the ideal signal.](images/resampler)\ 

Currently, SoLoud supports "point sample" resampling, which means it simply skips 
or repeats samples as needed, "linear interpolation", which calculates linear 
interpolation of samples, as well as "catmull-rom" which, instead of doing
linear interpolation, uses a catmull-rom spline to interpolate the samples.

The linear interpolation resampler is used by default.

The resampler can be set on a per-bus basis, so if you want to use a higher (or lower) quality
resampler for a certain bus, you can do so. Note that if all sample data has the same
sample rate, the point sample resampler will actually produce the best results, as it
will not affect the signal (in that, and only in that case).

### Pan


Where the sound is coming from in the stereo sound, ranging from left
speaker only to right speaker only. SoLoud uses an algorithm to
calculate the left/right channel volume so that the overall volume is
retained across the field. You can also set the left/right volumes
directly, if needed.

### Handle


SoLoud uses throwaway handles to control sounds. The handle is an
integer, and internally tracks the voice and sound id, as well as an
"uniqueness" value.

If you try to use a handle after the sound it represents has stopped,
the operation is quietly discarded (or if you're requesting information,
some kind of generic value is returned, usually 0). You can also query the validity
of a handle.

### Latency


Audio latency generally means the time it takes from triggering a sound
to the sound actually coming out of the speakers. The smaller the
latency, the better.

Unfortunately, there's always some latency. The primary source of
latency (that a programmer can have any control over) is the size of
audio buffer. Generally speaking, the smaller the buffer, the lower the
latency, but at the same time, the smaller the buffer, the more likely
the system hits buffer underruns (ie, the play head marches on but
there's no data ready to be played) and the sound breaks down horribly.

Assuming there's no other sources of latency (and there quite likely
is), with 2048 sample buffer and 44100Hz playback, the latency is around
46 milliseconds, which is tolerable in most computer game use cases. A 
100ms latency is already easily noticeable. In music terms, when playing
drums, 40ms is too much, but when playing slowly evolving atmospheric pads,
200ms is fine.

### Filter


Audio streams can also be modified on the fly for various effects.
Typical uses are different environmental effects such as echoes or
reverb, or low pass (bassy sound) / high pass (tinny sound) filters, but
basically any kind of modification can be done; the primary limitations
are processor power, imagination, and the developer's skill in digital
signal processing.

SoLoud lets you hook several filters to a single audio stream, as well
as to the global audio output. By default, you can use up to four filters,
but this can be easily changed by editing SoLoud.h file and rebuilding
the library.

SoLoud also support STFT filters, where the samples are converted to
frequency domain for modification and back for playback.

### Mixing Bus


In addition to mixing audio streams together at the "global" level,
SoLoud includes mixing busses which let you mix together groups of audio
streams. These serve several purposes.

The most typical use would be to let the user change the volume of
different kinds of audio sources - music, sound effects, speech. In this
case, you would have one mixing bus for each of these audio source
groups, and simply change the volume on the mixing bus, instead of
hunting down every sound separately.

When using environmental effects filters, you most likely won't want the
background music to get filtered; the easiest way to handle this is to
apply the filters to the mixing bus that plays the sound effects. This
will also save on processing power, as you don't need to apply the
environmental audio filters on every sound effect separately.

It's also possible that you have some very complex audio sources, such
as racing cars. In this case it makes sense to place all the audio
streams that play from one car into a mixing bus, and then adjust the
panning (or, eventually, 3d position) of the mixing bus.

Additional feature of the mixing busses in SoLoud is that you can
request visualization data from a bus, instead of just from the global
scope.

### Queue

SoLoud also contains a special kind of audio source called Queue. This
can be used to queue other audio sources. Once one stops, the next one
starts playing.

This can be useful when chaining sounds, like having a never ending song
by queuing random patterns.
